Domain-Adaptive Discriminative One-Shot Learning of Gestures
نویسندگان
چکیده
The objective of this paper is to recognize gestures in videos – both localizing the gesture and classifying it into one of multiple classes. We show that the performance of a gesture classifier learnt from a single (strongly supervised) training example can be boosted significantly using a ‘reservoir’ of weakly supervised gesture examples (and that the performance exceeds learning from the one-shot example or reservoir alone). The one-shot example and weakly supervised reservoir are from different ‘domains’ (different people, different videos, continuous or noncontinuous gesturing, etc.), and we propose a domain adaptation method for human pose and hand shape that enables gesture learning methods to generalise between them. We also show the benefits of using the recently introduced Global Alignment Kernel [12], instead of the standard Dynamic Time Warping that is generally used for time alignment. The domain adaptation and learning methods are evaluated on two large scale challenging gesture datasets: one for sign language, and the other for Italian hand gestures. In both cases performance exceeds the previous published results, including the best skeleton-classification-only entry in the 2013 ChaLearn challenge.
منابع مشابه
Adaptive Local Spatiotemporal Features from RGB-D Data for One-Shot Learning Gesture Recognition
Noise and constant empirical motion constraints affect the extraction of distinctive spatiotemporal features from one or a few samples per gesture class. To tackle these problems, an adaptive local spatiotemporal feature (ALSTF) using fused RGB-D data is proposed. First, motion regions of interest (MRoIs) are adaptively extracted using grayscale and depth velocity variance information to greatl...
متن کاملKeep It Simple and Sparse: Real-Time Action Recognition
Sparsity has been showed to be one of the most important properties for visual recognition purposes. In this paper we show that sparse representation plays a fundamental role in achieving one-shot learning and real-time recognition of actions. We start off from RGBD images, combine motion and appearance cues and extract state-of-the-art features in a computationally efficient way. The proposed ...
متن کاملLearning feed-forward one-shot learners
One-shot learning is usually tackled by using generative models or discriminative embeddings. Discriminative methods based on deep learning, which are very effective in other learning scenarios, are ill-suited for one-shot learning as they need large amounts of training data. In this paper, we propose a method to learn the parameters of a deep model in one shot. We construct the learner as a se...
متن کاملZero-Shot Fine-Grained Classification by Deep Feature Learning with Semantics
Fine-grained image classification, which aims to distinguish images with subtle distinctions, is a challenging task due to two main issues: lack of sufficient training data for every class and difficulty in learning discriminative features for representation. In this paper, to address the two issues, we propose a two-phase framework for recognizing images from unseen fine-grained classes, i.e. ...
متن کاملRecognizing Unfamiliar Gestures for Human-Robot Interaction Through Zero-Shot Learning
Human communication is highly multimodal, including speech, gesture, gaze, facial expressions, and body language. Robots serving as human teammates must act on such multimodal communicative inputs from humans, even when the message may not be clear from any single modality. In this paper, we explore a method for achieving increased understanding of complex, situated communications by leveraging...
متن کامل